207 rollup compaction simplified #232
Codecov Report
Additional details and impacted files

@@             Coverage Diff              @@
##           main-1.0.0     #232    +/-   ##
============================================
- Coverage       92.00%   90.86%   -1.15%
============================================
  Files              88       93       +5
  Lines            2214     2386     +172
  Branches          168      178      +10
============================================
+ Hits             2037     2168     +131
- Misses            177      218      +41

View full report in Codecov by Sentry.
A lot of files changed, most of them small refactorings, which makes this a bit hard to review. Still, the central class seems well written and the logic is the one we discussed. I only have one doubt, about a test.
src/test/scala/io/qbeast/spark/utils/QbeastFilterPushdownTest.scala
We are finally merging the rollup!! Great job!! @alexeiakimov
Description
The present PR provides an initial implementation of indexing and optimization based on rollup compaction, as proposed in #207. The most important changes are:
- AddFile has changed to support multiple blocks.
- replicated flags of its blocks and id not persisted in the DeltaLog metadata anymore.
Type of change
This PR introduces incompatible changes in the metadata format, both for individual index files and for the DeltaLog. Tables written with the previous version cannot be read by the present version, and vice versa. However, existing tables can (in theory) be converted to the new format by transforming the AddFile tags and the DeltaLog metadata without touching the actual files.
How Has This Been Tested? (Optional)
The correctness tests are provided in the project code. In addition, @Jiaweihu08 ran some extra performance tests comparing the proposed version with the previous one and with pure Delta.